
Draft: feat(openshift): add standalone RHOAI deployment on any OCP cluster #853

Open
amastbau wants to merge 2 commits into redhat-developer:main from amastbau:feat/openshift-rhoai-standalone

Conversation


@amastbau amastbau commented Jul 3, 2026


Summary

  • Extract profile system (RHOAI, ServiceMesh, Serverless, etc.) from pkg/target/service/snc/profile/ to pkg/provider/openshift/profile/ — the code had zero SNC dependencies, only the package path coupled it
  • Add new mapt openshift rhoai create/destroy command that deploys RHOAI on any existing OpenShift cluster given a kubeconfig path
  • Fix NetworkPolicy issue for KnativeServing webhook in mesh-enrolled namespaces (only surfaces on multi-node clusters)
  • SNC backward compatible — same code, new import path

Usage

# Deploy RHOAI (stable channel) with full dependency chain on any OCP cluster
mapt openshift rhoai create \
  --kubeconfig /path/to/kubeconfig \
  --project-name my-rhoai \
  --backed-url file:///tmp/mapt-state

# Deploy with specific profiles
mapt openshift rhoai create \
  --kubeconfig ~/.kube/config \
  --project-name my-rhoai \
  --backed-url file:///tmp/state \
  --profile ai,nvidia

# Tear down everything
mapt openshift rhoai destroy \
  --project-name my-rhoai \
  --backed-url file:///tmp/mapt-state

The default profile, --profile ai, installs the full RHOAI stack:

  1. Service Mesh v2 (Maistra) + Authorino
  2. OpenShift Serverless (KnativeServing)
  3. RHOAI operator (rhods-operator from stable channel)
  4. DataScienceCluster CR with KServe managed

Context

Supports epic AIPCC-19537 — adding an ITS to ensure RHAII builds work on stable RHOAI versions. This is the first step (task AIPCC-19538): enabling mapt to provision RHOAI from the stable channel on any OpenShift cluster.

E2E Test Results

Happy path tested on a real multi-node OCP 4.20 cluster (3 masters + 3 workers, IBM Cloud).

Note: This is a first pass. The happy path has passed once; more code review and testing are needed before merge.

Cluster

Server Version: 4.20.8 / Kubernetes v1.33.6
Nodes: 3 masters + 3 workers

Operators Installed (all Succeeded)

rhods-operator.2.25.8          Red Hat OpenShift AI              2.25.8    Succeeded
servicemeshoperator.v2.6.17    Red Hat OpenShift Service Mesh 2  2.6.17-0  Succeeded
serverless-operator.v1.37.1    Red Hat OpenShift Serverless      1.37.1    Succeeded
authorino-operator.v1.4.1      Authorino Operator                1.4.1     Succeeded

Prerequisites Ready

SMCP:             data-science-smcp   5/5   ComponentsReady   v2.6.17
KnativeServing:   knative-serving     1.17  True

DataScienceCluster Components

kserve: true          dashboard: true        codeflare: true
ray: true             workbenches: true      trustyai: true
data-science-pipelines-operator: true

RHOAI Pods (all Running)

codeflare-operator-manager                     1/1  Running
kserve-controller-manager                      1/1  Running
kuberay-operator                               1/1  Running
notebook-controller-deployment                 1/1  Running
odh-model-controller                           1/1  Running
odh-notebook-controller-manager                1/1  Running
rhods-dashboard (x2)                           3/3  Running
trustyai-service-operator-controller-manager   1/1  Running

Issues Found During Testing

  1. KnativeServing webhook timeout (fixed in this PR): On multi-node clusters, ServiceMesh's deny-all NetworkPolicy in the knative-serving namespace blocks API server → webhook traffic. This PR adds a NetworkPolicy that allows webhook ingress. The issue is masked on SNC, where all traffic is node-local.

  2. DSC datasciencepipelines conflict (cluster-specific, not a mapt bug): The test cluster had pre-existing Argo Workflows CRDs, which conflict with the DSP operator. DSC reports NotReady for this component. KServe and all other components work correctly. Workaround: set datasciencepipelines to Removed in the DSC spec, or remove existing Argo CRDs.
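
A minimal Go sketch of the first workaround for issue 2, building a JSON merge patch that marks the component as Removed. The `spec.components.<name>.managementState` layout is an assumption about the DataScienceCluster schema and is not taken from this PR; `removeComponentPatch` is a hypothetical helper, not part of mapt.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// removeComponentPatch builds a JSON merge patch that sets a single
// DataScienceCluster component's managementState to "Removed".
// ASSUMPTION: the spec.components.<name>.managementState layout is the
// author's reading of the DSC schema, not something confirmed by this PR.
func removeComponentPatch(component string) ([]byte, error) {
	patch := map[string]any{
		"spec": map[string]any{
			"components": map[string]any{
				component: map[string]any{
					"managementState": "Removed",
				},
			},
		},
	}
	return json.Marshal(patch)
}

func main() {
	b, err := removeComponentPatch("datasciencepipelines")
	if err != nil {
		panic(err)
	}
	// The printed document could be supplied as a merge patch against the DSC.
	fmt.Println(string(b))
}
```

A merge patch touches only the named field, so the other components' management states stay as deployed.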

Test plan

  • make build passes
  • make test passes (all existing tests, SNC unaffected)
  • mapt openshift rhoai create --help shows correct flags
  • mapt openshift rhoai destroy --help shows correct flags
  • E2E: deployed RHOAI on real multi-node OCP 4.20 cluster — all operators Succeeded, KServe running, dashboard running
  • E2E: mapt openshift rhoai destroy teardown (not yet tested)
  • E2E: test on a clean cluster without pre-existing Argo CRDs

🤖 Generated with Claude Code

Extract the profile system (RHOAI, ServiceMesh, Serverless, etc.) from
pkg/target/service/snc/profile/ into pkg/provider/openshift/profile/ so
it can be used independently of SNC. The profile code had zero SNC
dependencies — only the package path coupled it.

Add new `mapt openshift rhoai create/destroy` command that deploys RHOAI
on any existing OpenShift cluster given a kubeconfig path. This supports
the AIPCC-19537 epic for RHAII integration testing on stable RHOAI.

Usage:
  mapt openshift rhoai create --kubeconfig <path> --project-name <name> --backed-url <url>
  mapt openshift rhoai destroy --project-name <name> --backed-url <url>

Profiles default to "ai" which installs the full RHOAI stack (ServiceMesh
v2, Serverless, RHOAI operator, DataScienceCluster with KServe).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

coderabbitai Bot commented Jul 3, 2026



📝 Walkthrough


This PR adds a new OpenShift provider with a RHOAI Pulumi-based stack action supporting Create and Destroy operations, along with new openshift rhoai create/destroy CLI commands wired into the root command. The profile package's import path was relocated and its error message updated accordingly.

Changes

OpenShift RHOAI feature

| Layer / File(s) | Summary |
|---|---|
| OpenShift provider foundation: pkg/provider/openshift/openshift.go | Adds OpenShift type with stubbed Init/DefaultHostingPlace, Provider() factory, NoCredentials, and DestroyStack delegating to manager.DestroyStack. |
| RHOAI stack action: pkg/provider/openshift/action/rhoai/rhoai.go | Adds RHOAIArgs, Create/Destroy functions, deploy() building a manager.Stack, and pulumiProgram() reading kubeconfig and invoking profile.Deploy. |
| CLI commands and root wiring: cmd/mapt/cmd/openshift/openshift.go, cmd/mapt/cmd/openshift/rhoai.go, cmd/mapt/cmd/root.go | Adds openshift root command with rhoai subcommand exposing create/destroy, and registers the command in rootCmd. |
| Profile package relocation: cmd/mapt/cmd/aws/services/snc.go, pkg/provider/aws/action/snc/snc.go, pkg/provider/openshift/profile/profile.go | Updates import paths from target/service/snc/profile to provider/openshift/profile and revises the unsupported-profile error message. |

Estimated code review effort: 3 (Moderate) | ~25 minutes

Sequence Diagram(s)

sequenceDiagram
  participant CLI as rhoai create
  participant Action as rhoaiAction.Create
  participant Deploy as rhoaiRequest.deploy
  participant Pulumi as pulumiProgram
  participant Manager as manager.UpStack

  CLI->>Action: RHOAIArgs(kubeconfig, profiles, prefix)
  Action->>Action: profile.Validate(profiles)
  Action->>Deploy: deploy()
  Deploy->>Manager: UpStack(stack config, pulumiProgram)
  Manager->>Pulumi: run pulumiProgram
  Pulumi->>Pulumi: read kubeconfig, create k8s provider
  Pulumi->>Pulumi: profile.Deploy(profiles, provider, kubeconfig, prefix)
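
The diagram shows profiles being validated before any stack work starts. A rough Go sketch of that fail-fast step follows, under stated assumptions: `parseProfiles`, `validate`, and the `supported` set are hypothetical stand-ins, not the real `profile.Validate` implementation, and only the names "ai" and "nvidia" come from the PR's usage examples.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// supported is a HYPOTHETICAL set of profile names. Only "ai" and "nvidia"
// appear in the PR's usage examples; the real list lives in the profile package.
var supported = map[string]bool{"ai": true, "nvidia": true}

// parseProfiles splits a comma-separated --profile value and falls back to
// the documented default "ai" when nothing is given.
func parseProfiles(flagValue string) []string {
	var out []string
	for _, p := range strings.Split(flagValue, ",") {
		if p = strings.TrimSpace(p); p != "" {
			out = append(out, p)
		}
	}
	if len(out) == 0 {
		return []string{"ai"}
	}
	return out
}

// validate rejects the first unknown profile so the caller can stop
// before the Pulumi program is ever started.
func validate(profiles []string) error {
	for _, p := range profiles {
		if !supported[p] {
			names := make([]string, 0, len(supported))
			for n := range supported {
				names = append(names, n)
			}
			sort.Strings(names)
			return fmt.Errorf("profile %q is not supported; supported profiles: %s",
				p, strings.Join(names, ", "))
		}
	}
	return nil
}

func main() {
	for _, v := range []string{"ai,nvidia", "", "ai,bogus"} {
		ps := parseProfiles(v)
		fmt.Printf("%q -> %v, err=%v\n", v, ps, validate(ps))
	}
}
```

Validating up front means a typo in --profile fails immediately instead of partway through a cluster deployment.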
🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)

| Check name | Status | Explanation |
|---|---|---|
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Title check | ✅ Passed | The title clearly summarizes the main change: standalone RHOAI deployment on OpenShift clusters. |
| Description check | ✅ Passed | The description matches the changeset by describing the OpenShift RHOAI command and profile package extraction. |


@coderabbitai coderabbitai Bot left a comment


🧹 Nitpick comments (1)
pkg/provider/openshift/action/rhoai/rhoai.go (1)

20-24: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low value

RHOAIArgs.Prefix isn't wired to any CLI flag.

The field exists and Create defaults it when empty, but cmd/mapt/cmd/openshift/rhoai.go's createRHOAI never populates it from a flag, so users have no way to override the default prefix.

Also applies to: 41-51
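
One possible shape for the fix, sketched with Go's standard `flag` package in place of whatever flag library the CLI actually uses. The `--prefix` flag name, the `Kubeconfig` field, the `defaultPrefix` value, and the `parseArgs`/`withDefaults` helpers are all hypothetical; only `RHOAIArgs.Prefix` and the "default when empty" behaviour come from the review comment.

```go
package main

import (
	"flag"
	"fmt"
)

// defaultPrefix is a PLACEHOLDER value; the real default used by Create
// is not shown in this PR conversation.
const defaultPrefix = "rhoai"

// RHOAIArgs mirrors the struct named in the review. Prefix comes from the
// review comment; the Kubeconfig field name is an assumption.
type RHOAIArgs struct {
	Kubeconfig string
	Prefix     string
}

// parseArgs wires an optional prefix flag into RHOAIArgs.
// "prefix" is a hypothetical flag name chosen for illustration.
func parseArgs(argv []string) (*RHOAIArgs, error) {
	fs := flag.NewFlagSet("rhoai create", flag.ContinueOnError)
	args := &RHOAIArgs{}
	fs.StringVar(&args.Kubeconfig, "kubeconfig", "", "path to the cluster kubeconfig")
	fs.StringVar(&args.Prefix, "prefix", "", "resource name prefix (optional)")
	if err := fs.Parse(argv); err != nil {
		return nil, err
	}
	return args, nil
}

// withDefaults keeps the behaviour the review describes: the default
// applies only when the caller left the prefix empty.
func withDefaults(a *RHOAIArgs) *RHOAIArgs {
	if a.Prefix == "" {
		a.Prefix = defaultPrefix
	}
	return a
}

func main() {
	a, err := parseArgs([]string{"--kubeconfig", "/path/to/kubeconfig", "--prefix", "ci"})
	if err != nil {
		panic(err)
	}
	fmt.Println(withDefaults(a).Prefix)
}
```

Leaving the flag's own default empty keeps the fallback in one place, so the action code remains the single source of the default value.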

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/provider/openshift/action/rhoai/rhoai.go` around lines 20 - 24, The
RHOAIArgs.Prefix field is never populated from the CLI, so users cannot override
the default prefix. Update the OpenShift RHOAI command wiring in createRHOAI to
add a flag for the prefix and pass its value into RHOAIArgs, then ensure
RHOAIArgs.Create still falls back to the default only when the flag is unset.
Use the RHOAIArgs and createRHOAI symbols to locate the affected flow.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Enterprise

Run ID: 43b9713c-a8be-4745-82e1-3c6dced8bf93

📥 Commits

Reviewing files that changed from the base of the PR and between 20fee32 and 360527f.

📒 Files selected for processing (17)
  • cmd/mapt/cmd/aws/services/snc.go
  • cmd/mapt/cmd/openshift/openshift.go
  • cmd/mapt/cmd/openshift/rhoai.go
  • cmd/mapt/cmd/root.go
  • pkg/provider/aws/action/snc/snc.go
  • pkg/provider/openshift/action/rhoai/rhoai.go
  • pkg/provider/openshift/openshift.go
  • pkg/provider/openshift/profile/client.go
  • pkg/provider/openshift/profile/nfd.go
  • pkg/provider/openshift/profile/nvidia.go
  • pkg/provider/openshift/profile/openshift_ai.go
  • pkg/provider/openshift/profile/operator.go
  • pkg/provider/openshift/profile/profile.go
  • pkg/provider/openshift/profile/serverless.go
  • pkg/provider/openshift/profile/servicemesh.go
  • pkg/provider/openshift/profile/servicemesh_v2.go
  • pkg/provider/openshift/profile/virtualization.go

@amastbau amastbau changed the title from "feat(openshift): add standalone RHOAI deployment on any OCP cluster" to "Draft: feat(openshift): add standalone RHOAI deployment on any OCP cluster" Jul 3, 2026
… namespaces

When ServiceMesh enrolls knative-serving via SMMR, it creates a deny-all
NetworkPolicy that blocks API server -> webhook traffic on multi-node
clusters. This causes KnativeServing install to fail with webhook timeout
errors. On SNC this was masked because all traffic is node-local.

Add a NetworkPolicy allowing ingress to webhook pods on port 8443 before
creating the Knative CR, so admission webhooks remain reachable regardless
of mesh network policies.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
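
For reviewers, a rough Go sketch of the kind of NetworkPolicy the commit message above describes, printed as JSON. This is not the code from the PR: the policy name `allow-webhook-ingress` and the `app: webhook` pod label are assumptions made for illustration, and the structs are local stand-ins rather than the Kubernetes client types. Only the namespace (knative-serving) and port 8443 come from the commit message.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Local stand-ins for the networking.k8s.io/v1 NetworkPolicy fields used here.
type policyPort struct {
	Protocol string `json:"protocol"`
	Port     int    `json:"port"`
}

type ingressRule struct {
	Ports []policyPort `json:"ports"`
}

type labelSelector struct {
	MatchLabels map[string]string `json:"matchLabels"`
}

type policySpec struct {
	PodSelector labelSelector `json:"podSelector"`
	PolicyTypes []string      `json:"policyTypes"`
	Ingress     []ingressRule `json:"ingress"`
}

type objectMeta struct {
	Name      string `json:"name"`
	Namespace string `json:"namespace"`
}

type networkPolicy struct {
	APIVersion string     `json:"apiVersion"`
	Kind       string     `json:"kind"`
	Metadata   objectMeta `json:"metadata"`
	Spec       policySpec `json:"spec"`
}

// webhookIngressPolicy allows TCP ingress on the given port to pods matching
// podLabels. The rule has ports but no "from" peers, so it admits traffic
// from any source, including API server traffic arriving from another node.
func webhookIngressPolicy(namespace string, podLabels map[string]string, port int) networkPolicy {
	return networkPolicy{
		APIVersion: "networking.k8s.io/v1",
		Kind:       "NetworkPolicy",
		Metadata: objectMeta{
			Name:      "allow-webhook-ingress", // hypothetical name
			Namespace: namespace,
		},
		Spec: policySpec{
			PodSelector: labelSelector{MatchLabels: podLabels},
			PolicyTypes: []string{"Ingress"},
			Ingress:     []ingressRule{{Ports: []policyPort{{Protocol: "TCP", Port: port}}}},
		},
	}
}

func main() {
	// "app: webhook" is an ASSUMED label; the PR does not state the selector.
	p := webhookIngressPolicy("knative-serving", map[string]string{"app": "webhook"}, 8443)
	out, err := json.MarshalIndent(p, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

NetworkPolicies are additive, so an allow rule like this can coexist with the mesh's deny-all policy instead of replacing it.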